INDEX
Explanations
expressions of honesty and candidness
Negative Logits
kapturem    -0.51
miniaturka  -0.50
deseo       -0.50
recours     -0.49
outcomes    -0.48
Sünde       -0.47
Vergnügen   -0.47
dragón      -0.47
Gnade       -0.46
izante      -0.45
Positive Logits
probably    0.89
really      0.78
barely      0.77
honestly    0.74
prolly      0.71
probable    0.70
probably    0.70
Probably    0.68
Really      0.67
Probably    0.66
Activations Density 0.292%