Explanations
references to war and related concepts
Negative Logits
Token        Logit
purpoſe      -0.86
ویکیپدیای    -0.83
faſt         -0.82
rechange     -0.80
Monfieur     -0.78
pitié        -0.78
pleaſure     -0.76
uſ           -0.76
ſtate        -0.75
tranſ        -0.73
Positive Logits
Token        Logit
war          1.01
War          0.79
wars         0.66
tables       0.66
Tab          0.61
Wars         0.61
War          0.61
gen          0.60
tab          0.59
tably        0.59
Activations Density: 0.168%