INDEX
Explanations
mentions or references to the letter 'G'
Negative Logits
ạp        -0.18
apid      -0.16
ARIANT    -0.15
PPER      -0.15
latable   -0.14
指导      -0.14
ADDE      -0.14
èo        -0.14
ishment   -0.14
737       -0.14
Positive Logits
iron      0.30
onz       0.26
arc       0.23
ij        0.23
avir      0.22
IRON      0.21
erman     0.21
erm       0.21
elsen     0.21
uti       0.20
Activations Density: 0.022%