Explanations
words related to perception, discernment, and inference
terms related to understanding and perception
Negative Logits
Token       Logit
href        -0.69
hyde        -0.64
zilla       -0.63
Mehran      -0.61
fixture     -0.60
Medicare    -0.58
sche        -0.58
bish        -0.57
wife        -0.57
ammy        -0.57
Positive Logits
Token       Logit
ibly         0.98
ible         0.98
iour         0.86
エル         0.82
ibles        0.79
orial        0.78
wered        0.77
ibility      0.74
MENTS        0.74
Curve        0.73
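The Negative and Positive Logits lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. The page does not say how these values are computed; a common approach for sparse-autoencoder dashboards, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and take the extremes. The names `decoder_direction`, `W_U`, `vocab`, and `top_logit_effects` are hypothetical, and the toy data below is made up for illustration only.

```python
import numpy as np


def top_logit_effects(decoder_direction, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumption (hypothetical, not confirmed by the page): the feature's
    direct logit effect is its decoder direction multiplied by the
    unembedding matrix W_U, with shapes (d_model,) and (d_model, vocab_size).
    """
    logits = decoder_direction @ W_U          # shape: (vocab_size,)
    order = np.argsort(logits)                # token indices, ascending by logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: d_model = 2, vocabulary of 4 tokens (made-up numbers).
direction = np.array([1.0, 0.0])
W_U = np.array([[0.5, -0.2, 0.9, -0.7],
                [0.3, 0.3, 0.3, 0.3]])
vocab = ["ible", "href", "ibly", "hyde"]

neg, pos = top_logit_effects(direction, W_U, vocab, k=2)
print(neg)  # [('hyde', -0.7), ('href', -0.2)]
print(pos)  # [('ibly', 0.9), ('ible', 0.5)]
```

Sorting once and reading both ends keeps the two lists consistent with each other; real dashboards may also normalize or exclude special tokens, which this sketch does not attempt.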
Activations Density: 0.032%
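An activation density of 0.032% means the feature is rarely active: roughly 32 out of every 100,000 tokens. The page does not define the metric; the sketch below assumes it is the fraction of token positions where the feature's activation exceeds zero. The function name `activation_density` and the `threshold` parameter are hypothetical.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature fires.

    Assumption: "fires" means activation > threshold (0.0 by default).
    """
    activations = np.asarray(activations)
    return float(np.mean(activations > threshold))


# Toy example: a feature active on 32 of 100,000 tokens.
acts = np.zeros(100_000)
acts[:32] = 1.0
print(f"{100 * activation_density(acts):.3f}%")  # 0.032%
```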