INDEX
Explanations
short phrases or sentences expressing physical actions or states
negative comments or criticisms about individuals
Negative Logits
Token             Logit
iosyncr           -0.67
uates             -0.66
osponsors         -0.66
SpaceEngineers    -0.65
ongoing           -0.61
conduc            -0.60
erenn             -0.58
coincides         -0.57
Flavoring         -0.56
preliminary       -0.56
Positive Logits
Token             Logit
he                1.20
He                1.20
Was               1.10
Didn              1.08
He                1.07
didn              1.06
he                1.02
didnt             1.02
Had               1.02
his               0.98
Activations Density: 0.537%
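
For readers unfamiliar with the metric, the sketch below shows one common way an activation density figure like the one above could be computed: the fraction of sampled token positions on which the feature fires. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical, and the page itself does not state how its figure was derived.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of sampled tokens on which the
    feature is active. The dashboard does not document its exact definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: 1,000 token positions, 5 of which activate the feature.
sample = [0.0] * 995 + [0.3, 1.2, 0.7, 2.1, 0.05]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.500%
```

Under this reading, a density of 0.537% would mean the feature is active on roughly 5 of every 1,000 sampled tokens.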