Explanations
words related to political figures and events
Negative Logits
KB      -0.88
çİĭ     -0.79
MB      -0.79
ç¥ŀ     -0.78
AGES    -0.74
670     -0.72
346     -0.71
Gear    -0.70
Bohem   -0.70
650     -0.69
Positive Logits
re      1.36
rez     1.12
RE      1.12
reb     1.10
arre    0.99
rey     0.98
reys    0.97
ère     0.97
rem     0.96
res     0.93
Activation Density: 0.134%