INDEX
Explanations
frequent mentions of introductory phrases
Negative Logits
клопе        -0.68
alate        -0.58
niques       -0.58
midt         -0.55
hava         -0.55
omla         -0.54
entration    -0.54
tivation     -0.54
ztes         -0.54
large        -0.53
Positive Logits
course        1.05
course        0.80
COURSE        0.80
awtextra      0.73
Course        0.65
Course        0.64
tudo          0.59
COURSE        0.59
conseguenza   0.59
EndInit       0.56
Activations Density 0.169%