Explanations
concepts related to openness and acceptance
Negative Logits
Token        Logit
فريبيس       -0.54
таратура     -0.50
Execution    -0.48
execution    -0.45
Stopwatch    -0.45
kasarigan    -0.44
executions   -0.43
guantes      -0.43
tierrez      -0.43
Watched      -0.42
Positive Logits
Token        Logit
openness     0.77
open         0.77
opened       0.71
open         0.70
Open         0.66
terbuka      0.66
OPEN         0.66
ouverte      0.65
opens        0.64
ouvert       0.64
Activations Density: 0.055%