INDEX
Explanations
phrases indicating separation or distance
the concept of separation or division
Negative Logits
Token       Logit
occ         -0.66
activated   -0.64
DAY         -0.60
reens       -0.58
millenn     -0.57
[+          -0.56
Cards       -0.56
Gos         -0.56
livest      -0.56
car         -0.55
Positive Logits
Token       Logit
heid        1.03
ĸļ          0.86
igious      0.86
inguished   0.85
sexes       0.75
ellig       0.75
ively       0.75
apart       0.73
urbed       0.73
chool       0.70
Activations Density 0.008%