INDEX
Explanations
expressions of positive sentiment or approval
Negative Logits
autorytatywna    -0.95
ValueStyle       -0.75
OMITBAD          -0.63
iconLine         -0.61
صوتيه            -0.60
tagHelperRunner  -0.58
Personendaten    -0.56
harapkan         -0.56
Soorten          -0.56
oneofs           -0.55
Positive Logits
Ce       0.59
Country  0.59
Nice     0.58
Ce       0.57
š        0.57
ce       0.56
Kin      0.54
CAN      0.53
Man      0.52
liked    0.52
Activations Density 0.767%