INDEX
Explanations
statements related to opinion or critique
Negative Logits
Token    Logit
Bris     -0.95
Raqqa    -0.92
Qiao     -0.91
Reed     -0.90
Brist    -0.86
Coul     -0.86
Laf      -0.82
Cruz     -0.82
Leban    -0.82
Fab      -0.79
Positive Logits
Token    Logit
game     1.68
games    1.67
Game     1.67
GAME     1.66
games    1.65
game     1.64
Game     1.59
Games    1.58
GAME     1.48
gaming   1.48
Activations Density: 0.269%