INDEX
Explanations
phrases related to historical events or political ideologies
Negative Logits
Token       Logit
é¾įå¥ij士    -0.80
Ö¼          -0.73
mel         -0.73
sburgh      -0.70
baugh       -0.70
avid        -0.69
ensible     -0.67
ãĥ¯ãĥ³      -0.61
markers     -0.61
rored       -0.61
Positive Logits
Token       Logit
zzi         1.04
ÄŁ          1.00
qua         0.95
Paulo       0.91
ji          0.91
zeb         0.91
vernment    0.90
qi          0.87
zu          0.86
Zed         0.86
Activations Density: 0.025%