INDEX
Explanations
phrases emphasizing the importance or significance of concepts or elements
Negative Logits
aikaa        -0.71
umą          -0.69
pleaſure     -0.64
AppColors    -0.64
varandra     -0.62
WebControls  -0.61
nyingi       -0.61
enemmän      -0.60
känd         -0.60
roxene       -0.60
Positive Logits
same    0.95
same    0.88
very    0.80
VERY    0.73
very    0.73
SAME    0.70
Very    0.70
самого  0.69
Very    0.69
самых   0.69
Activations Density 0.073%