Explanations
forms of the verb "to be."
Negative Logits
totality      -0.80
quo           -0.77
tens          -0.74
societies     -0.73
acknow        -0.73
directions    -0.72
situations    -0.72
dart          -0.71
areas         -0.71
flare         -0.71
Positive Logits
Wrong         1.05
Them          1.02
Us            0.99
Stupid        0.99
Killed        0.98
Possible      0.98
Hate          0.98
Own           0.97
Changed       0.96
Hits          0.93
Activations Density 0.057%