INDEX
Explanations
text written in a language with special characters and accents
instances of a specific character or symbol related to dialogue or subjects in text
Negative Logits
Token        Logit
Hodg         -0.67
Jericho      -0.63
patched      -0.63
ORED         -0.63
kernels      -0.62
Mayweather   -0.60
immune       -0.59
Iro          -0.58
Asians       -0.58
proxies      -0.58
Positive Logits
Token        Logit
inen         1.39
nder         1.22
ä            1.15
nen          1.15
tten         1.11
ternity      1.04
¢            1.00
ng           0.98
ki           0.96
lde          0.95
Activations Density: 0.014%