INDEX
Explanations
words related to specific individuals or entities
prominent names and specific numerical references
Negative Logits
igate         -0.76
______        -0.71
Victoria      -0.66
ORK           -0.64
tnc           -0.64
$$$$          -0.63
)",           -0.62
igm           -0.62
elfare        -0.61
;;;;          -0.61
Positive Logits
nonetheless    1.41
nevertheless   1.13
etheless       1.08
persists       1.02
retains        0.95
persisted      0.92
insists        0.86
lacks          0.84
hasn           0.80
disagrees      0.79
Activations Density 0.469%