Explanations
phrases related to knowledge or information
phrases expressing knowledge or understanding
Negative Logits
pex       -0.85
ishable   -0.76
itter     -0.75
ovie      -0.74
gencies   -0.67
isco      -0.66
phrine    -0.66
otion     -0.64
atism     -0.64
cers      -0.63
Positive Logits
lege        1.22
ledge       1.05
ledged      0.83
ABOUT       0.81
firsthand   0.79
nothing     0.73
LED         0.73
urst        0.72
âĦ¢:        0.71
anecd       0.69
Activations Density: 0.051%