INDEX
Explanations
words related to magnetic forces and specific names like "Magnet" and "Xavier"
references to the character Magneto and the concept of magnetism
Negative Logits
ãģį          -0.92
OWS          -0.78
Citiz        -0.67
à©           -0.64
ORD          -0.63
UD           -0.61
Applicant    -0.58
center       -0.56
razen        -0.56
thought      -0.56
Positive Logits
ophon        1.03
stadt        0.99
eers         0.98
xon          0.97
ical         0.95
istical      0.94
bol          0.92
eer          0.90
hester       0.89
ertodd       0.87
Activations Density: 0.052%