Explanations
Instances of the word "Ir" and its variants, typically marking a particular name or identity.
Negative Logits
illet     -0.21
ilded     -0.20
evil      -0.16
omic      -0.16
icious    -0.16
CRY       -0.15
lý        -0.15
abet      -0.15
ears      -0.15
alth      -0.15
Positive Logits
replace    0.23
regular    0.22
ritable    0.21
ving       0.21
Ir         0.20
win        0.20
ises       0.19
relevant   0.19
ration     0.19
ir         0.19
Activations Density: 0.016%