INDEX
Explanations
texts with specific formatting quirks
conversational prompts or questions about experiences and opinions
Negative Logits
conom     -0.65
acan      -0.62
lifes     -0.61
Tinder    -0.60
Uran      -0.59
gunned    -0.57
upstairs  -0.57
fleeing   -0.57
crosses   -0.57
uno       -0.57
Positive Logits
Answer     1.07
YES        0.77
OVA        0.75
Yes        0.75
Correct    0.74
Interview  0.73
Yeah       0.72
RM         0.72
gow        0.71
³³³³³³³³   0.71
Activations Density: 0.105%