Explanations
the word "who", indicating a focus on individuals or subjects associated with actions or characteristics
Negative Logits
fair      -0.16
686       -0.16
wan       -0.14
scribe    -0.14
deo       -0.13
frog      -0.13
arna      -0.13
договор   -0.13
caravan   -0.13
medio     -0.13
Positive Logits
eric      0.16
anton     0.15
ombat     0.15
hoff      0.15
oya       0.15
νού       0.14
ieten     0.13
OURCE     0.13
stadt     0.13
rend      0.13
Activations Density 0.026%