Explanations
complex discussions about race and cultural identity
statements about gameplay experiences
questioning and speculative phrases
Negative Logits
fabulous    -0.61
Whew        -0.57
!           -0.55
terrific    -0.52
lovely      -0.51
wonderful   -0.51
とっても    -0.50
fabulous    -0.49
なかなか    -0.48
!).         -0.47
Positive Logits
/=           0.92
objectively  0.91
Lmao         0.81
lmao         0.78
subjective   0.76
argumento    0.75
idk          0.74
Idk          0.74
Referències  0.73
Idk          0.73
Activations Density: 0.134%