INDEX
Explanations
questions and statements about identity and self-perception
Negative Logits
pleaſure        -0.73
fevere          -0.69
دانشنامهٔ        -0.67
fuper           -0.62
fhort           -0.61
―――――           -0.61
poffible        -0.60
oredCriteria    -0.60
againſt         -0.60
neceff          -0.59
Positive Logits
pezi                0.55
KURZBESCHREIBUNG    0.55
bodyParser          0.51
artifactId          0.51
State               0.50
+:+                 0.49
gdala               0.48
تقاوى                0.46
\{\\                0.46
lude                0.46
Activations Density 0.218%