INDEX
Explanations
Punctuation marks, particularly the various quotation marks or apostrophes used in speech or dialogue.
Negative Logits
becauſe          -0.61
kvinder          -0.60
respectively     -0.60
bolig            -0.59
problemer        -0.58
nemlig           -0.58
applicazioni     -0.58
dramatist        -0.58
stället          -0.57
arbeid           -0.57
Positive Logits
Autoritní         0.87
normal            0.69
real              0.66
normal            0.66
hood              0.63
ifs               0.63
hot               0.62
real              0.62
jadx              0.61
hard              0.61
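The two logit lists above are the extremes of a single ranked set of per-token logit effects. The following is a minimal sketch of that ranking. It assumes the effects are already available as (token, value) pairs. In typical SAE dashboards they come from projecting the feature's decoder direction through the model's unembedding, a step not shown here. The helper name `top_and_bottom` is hypothetical, and the sample pairs are copied from the lists above.

```python
# Hypothetical helper: rank (token, effect) pairs and keep both extremes.
# A list of pairs is used rather than a dict because distinct vocabulary
# entries can render identically ("normal" and "real" each appear twice above).
def top_and_bottom(effects, k):
    # Highest effects first; sorted() is stable, so ties keep input order.
    positive = sorted(effects, key=lambda pair: pair[1], reverse=True)[:k]
    # Lowest (most negative) effects first.
    negative = sorted(effects, key=lambda pair: pair[1])[:k]
    return positive, negative


sample = [
    ("becauſe", -0.61),
    ("kvinder", -0.60),
    ("Autoritní", 0.87),
    ("normal", 0.69),
    ("real", 0.66),
    ("normal", 0.66),
]
pos, neg = top_and_bottom(sample, 2)
print(pos)  # [('Autoritní', 0.87), ('normal', 0.69)]
print(neg)  # [('becauſe', -0.61), ('kvinder', -0.6)]
```

Keeping pairs instead of a dictionary preserves duplicate-looking tokens, which matches how the lists above contain repeated surface strings.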
Activations Density 0.158%
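Activation density is conventionally the share of sampled tokens on which the feature fires, reported as a percentage. A value of 0.158% therefore means roughly 1.58 firings per 1,000 tokens. The following is a minimal sketch, assuming the activations are available as a flat list of floats and that "fires" means strictly greater than zero. The helper `activation_density` and its `threshold` parameter are hypothetical.

```python
# Hypothetical helper: percentage of tokens whose activation exceeds a threshold.
def activation_density(activations, threshold=0.0):
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Toy data (not from this feature): 3 of 2,000 tokens fire.
acts = [0.0] * 2000
acts[10] = 1.2
acts[500] = 0.4
acts[1999] = 2.5
print(activation_density(acts))  # 0.15
```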