INDEX
Explanations
No Explanations Found
Head Attr Weights
0:0.09
1:0.07
2:0.07
3:0.08
4:0.08
5:0.08
6:0.08
7:0.06
8:0.09
9:0.10
10:0.08
11:0.07
Negative Logits
osis:-3.34
Whe:-2.82
eph:-2.82
por:-2.81
Ach:-2.71
IPS:-2.67
Shawn:-2.67
phe:-2.66
Pont:-2.64
adelphia:-2.63
Positive Logits
Stevenson:2.95
utilitarian:2.56
derog:2.52
realism:2.46
xual:2.34
reader:2.33
firefighter:2.32
readers:2.29
deflation:2.29
Tolkien:2.27
Activations Density: 0.000%
No Known Activations
This feature has no known activations.