INDEX
Explanations
pronouns and verbs related to interaction
references to collective actions or experiences involving groups of people
Negative Logits
Upper         -0.62
Banking       -0.58
Additional    -0.58
cade          -0.57
Joint         -0.55
bender        -0.55
contrasting   -0.54
extant        -0.54
:(            -0.54
ormal         -0.54
Positive Logits
're           1.02
'll           1.02
've           0.96
ain           0.91
don           0.83
'm            0.79
wanna         0.78
rises         0.77
athered       0.74
areth         0.73
Activations Density: 0.299%