Explanations
questions about individuals and their relationships, particularly in social or investigative contexts
Negative Logits
074     -0.15
ì²      -0.14
089     -0.14
071     -0.14
Äįný    -0.14
erie    -0.14
atcher  -0.14
088     -0.14
quit    -0.14
973     -0.14
Positive Logits
åıĬåħ¶  0.16
obi     0.16
serter  0.16
än      0.14
mlink   0.14
ormsg   0.14
ĶĦ      0.14
SETS    0.14
uby     0.14
ä»ĺãģį  0.14
Activations Density 0.284%