INDEX
Explanations
periods at the end of sentences
sentences that contain references to the United States
Negative Logits
Token        Logit
illac        -0.63
ーティ       -0.60
cruc         -0.59
Mouse        -0.58
Rabbit       -0.56
ribbon       -0.56
tur          -0.55
pandemonium  -0.55
イ           -0.55
fined        -0.55
Positive Logits
Token        Logit
S            1.35
N            1.20
K            0.94
$.           0.93
NAT          0.91
Nations      0.89
NER          0.87
SI           0.86
Ns           0.84
zbek         0.80
Activations Density 0.042%