INDEX
Explanations
mentions of a specific anime-related term
references to specific movie titles or related terminology
Negative Logits
Token            Logit
Kurdistan        -0.67
Lyme             -0.62
neighbors        -0.61
Corpus           -0.60
Raqqa            -0.60
Reporting        -0.60
Islamabad        -0.59
depreciation     -0.59
reporting        -0.58
Certification    -0.58
Positive Logits
Token            Logit
imo              4.10
atto             1.85
aye              1.06
imus             1.04
imum             0.96
uo               0.91
lio              0.90
aro              0.88
andro            0.88
angelo           0.87
Activations Density: 0.027%