INDEX
Explanations
phrases related to specific details or events
expressions related to discomfort and contentious social issues
Negative Logits
Token          Logit
sbm            -0.67
/              -0.65
umbn           -0.64
20439          -0.62
arthed         -0.61
Seym           -0.60
epend          -0.60
ãĢij            -0.58
];             -0.58
escription     -0.57
Positive Logits
Token          Logit
?!             1.79
!?             1.65
huh            1.63
?              1.62
???            1.50
??             1.48
...?           1.47
.?             1.42
?!"            1.37
????           1.36
Activations Density: 0.903%
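
The density figure can be read as the share of tokens on which the feature fires at all. The page does not say how it was computed, so the following is only a minimal sketch under a common assumption: density is the fraction of tokens whose activation is above a threshold of zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens where the feature is
    active; the dashboard itself does not state its definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 tokens, of which 9 have a non-zero activation.
sample = [0.0] * 991 + [1.2] * 9
density = activation_density(sample)
print(f"Activations Density: {density:.3%}")
```

On this reading, a density near 0.9% would mean the feature is active on roughly 9 of every 1000 tokens in the sampled text.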