Explanations
foreign language characters or symbols
special characters or symbols in the text
Negative Logits
atis        -0.86
ĸļ          -0.80
selage      -0.79
unciation   -0.79
hof         -0.78
orno        -0.76
onom        -0.75
ortmund     -0.74
oris        -0.74
orius       -0.71
Positive Logits
dating       1.03
coming       0.84
stairs       0.81
ban          0.80
ward         0.79
bone         0.77
ded          0.76
side         0.76
lishes       0.76
dit          0.75
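The page does not say how these logit scores were computed. A common convention for SAE feature dashboards, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and list the tokens with the largest and smallest results. The sketch below illustrates that assumption in plain Python; the names `decoder_dir`, `w_u`, `logit_effects`, and `top_and_bottom` are hypothetical, and the toy numbers are invented for illustration, not taken from this feature.

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_dir: List[float], w_u: List[List[float]],
                  vocab: List[str]) -> Dict[str, float]:
    """Assumed method: dot the decoder direction (length d_model) with each
    vocab column of the unembedding matrix w_u (shape d_model x vocab)."""
    return {
        tok: sum(decoder_dir[i] * w_u[i][j] for i in range(len(decoder_dir)))
        for j, tok in enumerate(vocab)
    }

def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative

# Toy example (hypothetical 2-dimensional model with a 3-token vocabulary).
decoder_dir = [1.0, -0.5]
w_u = [[0.2, -1.0, 0.4],
       [0.6, 0.2, -0.4]]
vocab = ["a", "b", "c"]
effects = logit_effects(decoder_dir, w_u, vocab)
positive, negative = top_and_bottom(effects, k=1)
print(positive, negative)
```

With real model weights the same ranking would produce two lists shaped like the ones above, but whether this dashboard used exactly this projection is an assumption.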
Activation Density: 0.008%
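Assuming activation density means the percentage of sampled tokens on which the feature fires (activation above zero), a value of 0.008% corresponds to about 8 firing tokens per 100,000. The helper `activation_density` and its `threshold` parameter below are hypothetical, and the sample data is synthetic, chosen only to reproduce that figure.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)

# Synthetic data: 8 firing tokens out of 100,000.
sample = [0.0] * 99_992 + [1.5] * 8
print(f"{activation_density(sample):.3f}%")  # prints 0.008%
```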