INDEX
Explanations
mentions of specific names
references to influential individuals and their actions or statements
Negative Logits
Token        Logit
IENCE        -0.61
ufact        -0.59
vironment    -0.59
éĹ           -0.57
naissance    -0.56
ucket        -0.56
ishable      -0.55
ãĤ¼          -0.54
issance      -0.53
erity        -0.51
Positive Logits
Token        Logit
pole         0.59
endif        0.58
alion        0.56
knots        0.54
supra        0.53
detach       0.52
ridge        0.52
issan        0.52
Ct           0.52
oise         0.51
Activations Density: 1.490%