Explanations
references to individuals who identify as beginners in various contexts
Negative Logits
ontale      -0.59
lanniksi    -0.54
chì         -0.52
rzez        -0.52
lunches     -0.51
erobe       -0.51
那就是      -0.50
ligently    -0.50
avits       -0.49
Скачать     -0.49
POSITIVE LOGITS
novice       0.84
unfamiliar   0.84
beginner     0.80
newcomer     0.79
Anfänger     0.78
newcomers    0.78
beginners    0.74
Beginner     0.73
Novice       0.73
newbies      0.73
Activations Density 0.182%