INDEX
Explanations
words related to historical figures and events
names of historical figures and events
Negative Logits
Reviewer    -0.97
rongh       -0.70
yip         -0.65
notor       -0.62
pestic      -0.59
metic       -0.58
eatures     -0.57
merits      -0.57
recounts    -0.57
inding      -0.57
Positive Logits
..............    1.09
0                 1.04
??                0.94
1                 0.89
........          0.88
31                0.88
2                 0.87
00000000          0.87
..........        0.86
37                0.86
Activations Density 0.199%