INDEX
Explanations
words related to language or location, with a focus on specific languages or countries
occurrences of certain non-English or special characters in the text
Negative Logits
hyde     -0.86
aged     -0.71
agall    -0.71
ipolar   -0.68
ammy     -0.67
ucket    -0.66
abase    -0.65
ahar     -0.65
aging    -0.65
ngth     -0.61
Positive Logits
ãĤī      1.00
ת        0.96
×Ļ×      0.95
IJ        0.93
κ        0.92
׾        0.90
ä        0.85
×ķ       0.85
ä¸ī      0.85
æĢ       0.80
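Several of the positive-logit tokens above look like raw byte-level BPE pieces rather than readable text. The sketch below shows one way to turn such strings back into characters, assuming (the page does not say so) that they use the GPT-2 style byte-to-unicode display alphabet. The names `bytes_to_unicode`, `UNICODE_TO_BYTE`, and `decode_token` are illustrative, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style byte -> display-character table (assumed here)."""
    # Printable bytes are shown as themselves...
    bs = (list(range(0x21, 0x7F))      # '!' .. '~'
          + list(range(0xA1, 0xAD))    # U+00A1 .. U+00AC
          + list(range(0xAE, 0x100)))  # U+00AE .. U+00FF
    cs = bs[:]
    n = 0
    # ...and every other byte is shifted to a code point at or above 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: display character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Decode a byte-level token string to text.

    Tokens containing characters outside the display alphabet (for example,
    ones that already appear as real Greek or Hebrew letters) are returned
    unchanged. Incomplete UTF-8 sequences decode to U+FFFD.
    """
    if any(ch not in UNICODE_TO_BYTE for ch in token):
        return token
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["\u00e3\u0124\u012b", "\u00e4\u00b8\u012b", "\u03ba"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, the top token decodes to the bytes E3 82 89, the hiragana character ら, and the token listed at 0.85 with three characters decodes to E4 B8 89, the CJK character 三. Single-piece tokens such as a lone lead byte are fragments of multi-byte characters and have no standalone rendering.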
Activations Density 0.033%