INDEX
Explanations
names of individuals or places
unknown or unspecified entities or terms
Negative Logits
voy             -0.73
BOOK            -0.67
expressive      -0.64
pains           -0.62
unintended      -0.61
practicable     -0.58
uve             -0.58
APH             -0.58
resemb          -0.57
disobedience    -0.57
Positive Logits
buster          1.14
geon            1.01
irk             0.98
rat             0.97
ett             0.91
regate          0.88
ernel           0.88
etsu            0.87
adel            0.87
iewicz          0.86
Activations Density 0.034%