INDEX
Explanations
names related to individuals or locations, particularly ones with the word "Far" in them
occurrences of the name "Far."
Negative Logits
sburgh      -0.77
essee       -0.76
vironment   -0.74
ettings     -0.74
xual        -0.69
代          -0.64
Peel        -0.64
KY          -0.62
ptive       -0.62
ital        -0.62
Positive Logits
Far         0.95
Far         0.92
aday        0.85
bent        0.85
rug         0.83
away        0.81
az          0.80
agher       0.79
ouk         0.78
far         0.78
Activations Density 0.003%
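
The density figure can be illustrated with a minimal sketch, assuming it denotes the fraction of token positions on which the feature has a nonzero activation. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (active positions) / (all positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 3 of 100,000 token positions.
sample = [0.0] * 99_997 + [1.2, 0.4, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.003% corresponds to roughly 3 active positions per 100,000 tokens, which fits a feature that responds to a narrow set of tokens such as "Far".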