INDEX
Explanations
mentions of Bollywood films and related content
Negative Logits
untu        -0.15
ackers      -0.15
bane        -0.14
ourt        -0.14
Roosevelt   -0.14
inoa        -0.14
elah        -0.14
mund        -0.14
Ì£          -0.13
Cleaner     -0.13
Positive Logits
gee         0.18
ÑĢÑİ        0.16
ubits       0.16
ue          0.15
IFE         0.15
αν          0.14
",__        0.14
èĻ          0.14
oit         0.14
ÙĪÙĬØ©      0.14
Activations Density 0.001%