INDEX
Explanations
film-related entities such as movie titles, actors, and production information
mentions of specific names, places, and proper nouns
Negative Logits
tooth          -0.71
accelerated    -0.71
forgiven       -0.69
reintrodu      -0.66
worrying       -0.65
ãĢİ            -0.64
tightening     -0.64
perfect        -0.64
implementing   -0.64
setting        -0.63
Positive Logits
(@             0.95
cerpt          0.87
GoldMagikarp   0.86
anan           0.80
maxwell        0.77
Wolf           0.77
idav           0.77
ascript        0.76
Anonymous      0.76
veyard         0.75
Activations Density: 0.281%