Explanations
mentions of political figures, organizations, and events
specific patterns or combinations of letters that may indicate proper nouns or identifiers
Negative Logits
ãĥĩãĤ£      -0.79
ngth        -0.74
ãĥ¢         -0.71
ãĤ¼ãĤ¦ãĤ¹   -0.67
ãĤ®         -0.66
ãĤ¢ãĥ«      -0.65
ãĥī         -0.65
£ı          -0.65
ufact       -0.64
gage        -0.63
Positive Logits
tip         0.75
orah        0.69
uit         0.67
arma        0.67
Tip         0.67
bush        0.66
spir        0.66
atron       0.66
RP          0.65
ayn         0.65
Activations Density 0.077%
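
Several of the negative-logit tokens (for example ãĥĩãĤ£ and ãĤ¼ãĤ¦ãĤ¹) look like the printable stand-ins that a byte-level BPE vocabulary uses for raw bytes. The page does not name the tokenizer, so the following is only a minimal sketch under the assumption that a GPT-2-style byte-to-character table applies. The helper name decode_token is hypothetical and introduced here purely for illustration.

```python
def bytes_to_unicode():
    """Build the byte -> printable-character table used by GPT-2-style
    byte-level BPE (assumed here; the page does not state the tokenizer)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes are shifted to code points 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: map a displayed token string back to raw bytes
    and decode as UTF-8. Incomplete byte sequences become U+FFFD."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĥĩãĤ£", "ãĥ¢", "ãĤ¼ãĤ¦ãĤ¹", "ãĤ®", "ãĤ¢ãĥ«", "ãĥī", "£ı", "ngth"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, ãĤ¼ãĤ¦ãĤ¹ decodes to ゼウス and ãĥĩãĤ£ to ディ, while £ı is only a fragment of a multi-byte character and cannot be decoded on its own. Plain ASCII tokens such as ngth pass through unchanged.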